lessw2020
Contributor

Adds an option to run torch profiler tracing via two flags:
--run_profiler (T/F)
--profile_folder (str)
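
For illustration, here is a minimal sketch of how two flags like these could be parsed with `argparse`. The `str_to_bool` helper, the accepted spellings, and both default values are assumptions made for this example; they are not taken from the PR diff.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse a T/F style command-line value into a bool (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in {"t", "true", "1", "yes"}:
        return True
    if lowered in {"f", "false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected T or F, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="profiling flags (illustrative)")
    # Both defaults below are assumptions for this sketch.
    parser.add_argument("--run_profiler", type=str_to_bool, default=False)
    parser.add_argument("--profile_folder", type=str, default="profiling/traces")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--run_profiler", "T", "--profile_folder", "my_traces"])
print(args.run_profiler, args.profile_folder)
```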

Traces are saved out with rank_X as part of the trace name.
(screenshot: rank_named_traces)

Implemented as a context manager wrapped around the main training loop.
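
A rough sketch of what a context wrapper like this might look like, using `torch.profiler`. The names `maybe_run_profiler`, `rank_trace_path`, and `train`, the trace file naming, and the schedule values are all assumptions for illustration, not the code in this PR. In a distributed run the rank would presumably come from `torch.distributed.get_rank()`; here it is passed in explicitly.

```python
import contextlib
import os


def rank_trace_path(profile_folder: str, rank: int, step: int) -> str:
    # Hypothetical naming scheme: rank_X is part of the trace file name.
    return os.path.join(profile_folder, f"rank_{rank}_step_{step}.json")


@contextlib.contextmanager
def maybe_run_profiler(run_profiler: bool, profile_folder: str, rank: int = 0):
    """Yield a torch profiler when enabled, otherwise yield None."""
    if not run_profiler:
        yield None
        return

    # Imported lazily so the disabled path does not require torch.
    import torch.profiler

    os.makedirs(profile_folder, exist_ok=True)

    def save_trace(prof):
        # Called by the profiler each time an active window finishes.
        prof.export_chrome_trace(rank_trace_path(profile_folder, rank, prof.step_num))

    with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU],
        # Assumed schedule: skip 1 step, warm up for 1, record 2.
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=2),
        on_trace_ready=save_trace,
    ) as prof:
        yield prof


def train(num_steps: int, run_profiler: bool = False,
          profile_folder: str = "profiling/traces") -> int:
    completed = 0
    with maybe_run_profiler(run_profiler, profile_folder) as prof:
        for _ in range(num_steps):
            # The forward / backward / optimizer step would go here.
            completed += 1
            if prof is not None:
                prof.step()  # advance the profiler schedule
    return completed
```

With profiling disabled the wrapper yields `None`, so the training loop body stays the same in both modes and only the `prof.step()` call is guarded.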

facebook-github-bot added the CLA Signed label on Jan 17, 2024
Collaborator

wanchaol left a comment

Awesome! Thanks for the quick progress! I think this looks great; I have a few inline comments about how to trigger the profiler.

…rofiling.py file, global dumps folder, logging_utils.py
@lessw2020
Contributor Author

The PR is updated to address the previous feedback.
It adds user config control for profiling via train_config.toml,
a separate profiling.py file, a global dumps folder, and logging_utils.py.

Collaborator

wanchaol left a comment

LGTM, I only have one comment about the profiling flags.

…filing, create custom named folders for traces
wanchaol merged commit f1e86e4 into pytorch:main on Jan 19, 2024
lessw2020 added a commit that referenced this pull request Apr 18, 2024
jinsun-yoo pushed a commit to jinsun-yoo/torchtitan that referenced this pull request Oct 30, 2024